Abstract

The Graph Motif problem asks whether a given multiset of colors appears on a connected subgraph of a vertex-colored graph. The fastest known parameterized algorithm for this problem is based on a reduction to the k-Multilinear Detection (k-MLD) problem: the detection of multilinear terms of total degree k in polynomials presented as circuits. We revisit k-MLD and define k-CMLD, a constrained version of it which reflects Graph Motif more faithfully. We then give a fast algorithm for k-CMLD. As a result we obtain faster parameterized algorithms for Graph Motif and variants of it.

1 Introduction

The Graph Motif problem was introduced in [8] in the context of metabolic network analysis. It asks whether a given vertex-colored graph contains a connected subgraph whose colors agree with a given multiset of colors, the 'motif'. In this context motifs are often described as 'functional', to distinguish them from 'topological' motifs, which are fixed subgraphs such as a k-path. Further applications of functional motif discovery have been discussed in [1]. Topological motif discovery is also known to have many applications, in particular in protein networks [9].

Since its introduction, Graph Motif has received significant attention. It is known to be NP-hard even when the given graph is a tree of maximum degree 3 and the motif is a set [?]. However, in most practical cases the size of the targeted motif is relatively small, and so the parameterized version of the problem is arguably natural. The problem is fixed-parameter tractable [4] and the fastest known (randomized) parameterized algorithm [5] runs in $O^*(4^k)$ time and polynomial space, where k is the size of the motif.

The algorithm of [5] is based on a reduction to the k-Multilinear Detection problem (k-MLD) and a subsequent call of the fastest known algorithm for it [6, 10]. The k-MLD problem asks whether a polynomial presented as a circuit contains a multilinear term of degree k, when construed as a sum of monomials. Here, a circuit is a directed acyclic graph with addition and multiplication gates and terminals corresponding to variables. Notably, k-MLD also gives the fastest known algorithms for finding topological motifs such as k-trees [?].

Motivated by Graph Motif we define the k-Constrained Multilinear Detection problem (k-CMLD).

k-CMLD. On input of:

(i) an arithmetic circuit C representing a polynomial P(X),

(ii) a mapping $\chi : X \to \mathcal{C}$ of variables X to colors $\mathcal{C}$,

(iii) a mapping $\mu : \mathcal{C} \to \mathbb{N}$ of colors to natural numbers (the multiplicities),

decide whether P(X) construed as a sum of monomials contains an allowed multilinear monomial of degree k. An allowed multilinear monomial is a term that contains at most $\mu(c)$ appearances of variables colored with color c.
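
To make the definition concrete, the following minimal sketch checks the k-CMLD condition by brute force on a polynomial that is already expanded into a list of monomials. This is only an illustration of what counts as an allowed monomial: the function names and the list-of-tuples encoding are assumptions made here, and the actual problem receives a circuit, whose expansion may be exponentially large.

```python
from collections import Counter

def is_allowed_multilinear(monomial, color_of, mu, k):
    """monomial: tuple of variable names; a repeated name means a higher power.
    color_of plays the role of the variable-to-color map, mu of the multiplicities."""
    if len(monomial) != k:                      # wrong total degree
        return False
    if len(set(monomial)) != len(monomial):     # some variable appears squared
        return False
    per_color = Counter(color_of[x] for x in monomial)
    # every color may be used at most mu(c) times
    return all(count <= mu.get(c, 0) for c, count in per_color.items())

def has_allowed_monomial(monomials, color_of, mu, k):
    """Brute-force k-CMLD on an explicitly expanded polynomial (illustration only)."""
    return any(is_allowed_multilinear(m, color_of, mu, k) for m in monomials)

# Hypothetical toy instance: x1, x2 are red, x3 is blue, each color allowed once.
color_of = {"x1": "red", "x2": "red", "x3": "blue"}
mu = {"red": 1, "blue": 1}
monomials = [("x1", "x2"), ("x1", "x1"), ("x1", "x3")]
print(has_allowed_monomial(monomials, color_of, mu, k=2))  # True, because of x1*x3
```

Here x1·x2 is multilinear but uses red twice, and x1·x1 is not multilinear, so only x1·x3 qualifies.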

The technique of [5] extends to the k-CMLD problem, giving an $O^*(4^k)$ time and polynomial space algorithm. In this paper we show the following.

Theorem 1.1. The k-CMLD problem can be solved by a randomized algorithm in $O^*(2.54^k)$ time and polynomial space.

We stress that this upper bound reflects a worst-case scenario which occurs only when the multiplicities $\mu(c)$ are all equal to 3. The running time can be as low as $O^*(2^k)$ when only $O(\log n)$ color multiplicities are different from 1.

We present the algorithm and prove its properties in Section 2. The algorithm is a simple modification of the k-MLD algorithm in [10]. As a corollary, any problem that is reducible to k-MLD can now be solved with additional color constraints in $O^*(2.54^k)$ time. This allows the combination of topological and functional constraints in motifs, for example by discovering colored k-paths. In Section 3 we comment on applications of k-CMLD to Graph Motif and variants of it, such as Multiset Motif, Min-Add, Min-CC and Min-Substitute. We obtain a faster parameterized algorithm for each of these problems.

2 The algorithm and its proof 

Proof. We start by reviewing at a high level the purely algebraic algorithm for k-MLD in [6, 10]. The algorithm is based on properties of the group algebra $\mathbb{Z}_2[\mathbb{Z}_2^k]$ [6]. The group multiplication over the group of k-dimensional 0-1 vectors is entry-wise addition modulo 2. Consider any k-dimensional 0-1 vector $v$, and let $v_0$ denote the identity of the group, that is the zero vector. In the group algebra $\mathbb{Z}_2[\mathbb{Z}_2^k]$, we have

\[ (v_0 + v)^2 = v_0^2 + 2 v_0 v + v^2 = 2 v_0 + 2 v = 0 \bmod 2. \]

To understand this identity, note the use of commutativity and of the facts $v_0^2 = v^2 = v_0$, $v v_0 = v$. This is the annihilation property which, by commutativity, forces all non-multilinear terms of P(X) to go to 0 mod 2 if C is evaluated on vectors of the form $v_0 + v$. The second important property of this assignment is the survival property, which is the fact that products of the form

\[ (v_0 + v_1)(v_0 + v_2) \cdots (v_0 + v_k) \]

evaluate to non-zero if and only if the vectors $\{v_1, \ldots, v_k\}$ are linearly independent over $\mathbb{Z}_2$. In [6] it is also shown that k random 0-1 vectors are linearly independent with probability at least 1/4. So assigning to each variable a random value of the form $v_0 + v$ makes each multilinear monomial of P(X) evaluate to non-zero with probability at least 1/4.
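
The two properties can be checked directly on small cases. The sketch below is an illustrative encoding chosen here, not the implementation of [6, 10]: a vector of the group is an integer bitmask, the group operation is XOR, and an element of the group algebra with coefficients mod 2 is the set of vectors whose coefficient is 1. All function names are hypothetical.

```python
from functools import reduce

def multiply(a, b):
    """Product of two group algebra elements (frozensets of bitmask vectors), mod 2."""
    out = set()
    for u in a:
        for w in b:
            out ^= {u ^ w}          # toggling keeps coefficients modulo 2
    return frozenset(out)

def lift(v):
    """The element v_0 + v, where v_0 = 0 is the zero vector (the group identity)."""
    return frozenset({0}) ^ frozenset({v})

def lifted_product(vectors):
    """(v_0 + v_1)(v_0 + v_2)...(v_0 + v_t)."""
    return reduce(multiply, (lift(v) for v in vectors), frozenset({0}))

def independent(vectors):
    """Linear independence over Z_2 by elimination on the leading bit."""
    pivots = {}                      # leading bit position -> reduced vector
    for v in vectors:
        while v:
            h = v.bit_length() - 1
            if h not in pivots:
                pivots[h] = v
                break
            v ^= pivots[h]
        if v == 0:                   # v reduced to zero: it depends on earlier vectors
            return False
    return True

print(multiply(lift(0b101), lift(0b101)) == frozenset())     # annihilation: True
print(lifted_product([0b001, 0b010, 0b100]) != frozenset())  # independent vectors: True
print(lifted_product([0b001, 0b010, 0b011]) == frozenset())  # dependent vectors: True
```

The explicit set representation has up to 2^k entries and is used only to make the identities visible.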

The second part of the algorithm [10] treats the case where a multilinear monomial has an even number of copies in P(X). The algorithm first constructs an 'extended' version $C'$ of the input circuit C which represents the same polynomial P(X). It then labels each edge incoming to an addition gate in $C'$ with a different multiplier from a set of additional variables A. By the construction of $C'$ and the positions of the multipliers, P(X, A) has the isolation property: the coefficient of each multilinear monomial in the polynomial P(X, A) is 1. The algorithm then assigns random values from the field $\mathbb{F} = GF(2^{3+\log_2 k})$ to the variables in A, values of the form $(v_0 + v_i)$ to the variables in X as described above, and evaluates $C'$ over $\mathbb{F}[\mathbb{Z}_2^k]$. Essentially because each copy of a given multilinear monomial now gets a random multiplier from $\mathbb{F}$, there is a good probability that their sum is not 0 mod 2; formally this follows from the Schwartz-Zippel Lemma. The output is 'yes' if and only if the output of $C'$ is non-zero, an event which happens with constant probability.

We now modify the algorithm in order to solve k-CMLD. The only change is in the assignment $X \to \mathbb{Z}_2[\mathbb{Z}_2^k]$, which will now force not only non-multilinear but also 'undesired' multilinear monomials to evaluate to 0 mod 2.

(a) For each $c \in \mathcal{C}$ pick $\mu(c)$ random 'basis' vectors from $\mathbb{Z}_2^k$. Let $S_c$ denote the subspace over $\mathbb{Z}_2$ formed by these basis vectors. Repeat if the dimension $\dim(S_c)$ of the subspace $S_c$ is less than $\mu(c)$.

(b) For each $x \in X$ pick a random vector $v \neq v_0$ from $S_c$, where $c = \chi(x)$. Assign to x the element $v_0 + v$.

These two steps are easily implementable in O(kn) time and space. The claim for (a) is a corollary of the results in [6]. Picking a random vector in (b) can be done by picking a random non-empty subset of the vectors of the $S_c$ basis and adding them up.
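
Steps (a) and (b) can be sketched as follows. This is a minimal illustration under assumptions made here: vectors are integer bitmasks, every multiplicity satisfies 1 ≤ μ(c) ≤ k, and the names `sample_basis`, `sample_nonzero` and `assign` are hypothetical. The returned vector v stands for the assigned element v_0 + v.

```python
import random

def rank_gf2(vectors):
    """Rank over Z_2 of a list of bitmask vectors."""
    pivots = {}
    for v in vectors:
        while v:
            h = v.bit_length() - 1
            if h not in pivots:
                pivots[h] = v
                break
            v ^= pivots[h]
    return len(pivots)

def sample_basis(mu_c, k, rng):
    """Step (a): mu_c random vectors of Z_2^k, resampled until they are independent."""
    while True:
        basis = [rng.getrandbits(k) for _ in range(mu_c)]
        if rank_gf2(basis) == mu_c:
            return basis

def sample_nonzero(basis, rng):
    """Step (b): XOR of a uniformly random non-empty subset of the basis."""
    mask = rng.randrange(1, 1 << len(basis))
    v = 0
    for i, b in enumerate(basis):
        if (mask >> i) & 1:
            v ^= b
    return v

def assign(color_of, mu, k, seed=0):
    """Return the vector chosen for each variable, and the basis chosen for each color."""
    rng = random.Random(seed)
    bases = {c: sample_basis(m, k, rng) for c, m in mu.items()}
    vectors = {x: sample_nonzero(bases[c], rng) for x, c in color_of.items()}
    return vectors, bases

vectors, bases = assign({"a": "red", "b": "red", "c": "blue"}, {"red": 2, "blue": 1}, k=4)
print(vectors, bases)
```

Because the basis vectors are independent, distinct non-empty subsets give distinct non-zero sums, so the subset choice is uniform over the non-zero vectors of the subspace.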

It can be seen that with the proposed assignment $X \to \mathbb{Z}_2[\mathbb{Z}_2^k]$ non-multilinear monomials evaluate to 0 by the annihilation property. For the undesired multilinear monomials we will use the survival property but in its negative form: the product

\[ (v_0 + v_1)(v_0 + v_2) \cdots (v_0 + v_t) \]

is equal to 0 mod 2 if the vectors $\{v_1, \ldots, v_t\}$ are linearly dependent over $\mathbb{Z}_2$. A multilinear term which contains more than $\mu(c)$ variables colored with c will evaluate to 0 simply because the corresponding vectors are linearly dependent, as their number exceeds the dimension $\mu(c)$ of the subspace $S_c$ they belong to. In summary, we get a constrained annihilation property.
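
A small exhaustive check illustrates the constrained annihilation property for one color with multiplicity 2. The encoding (bitmask vectors, sets of vectors for group algebra elements mod 2) and the particular basis are assumptions chosen for this example only.

```python
from itertools import product

def expand(vectors):
    """Expand (v_0 + v_1)...(v_0 + v_t): XOR-sums of all subsets, coefficients mod 2."""
    terms = {0}                                   # start from the identity v_0
    for v in vectors:
        terms = terms ^ {t ^ v for t in terms}    # multiply by (v_0 + v)
    return terms

def nonzero_span(basis):
    """All non-zero vectors of the subspace spanned by the basis."""
    vecs = {0}
    for b in basis:
        vecs |= {u ^ b for u in vecs}
    return sorted(vecs - {0})

basis = [0b0011, 0b0101]        # a hypothetical basis: multiplicity 2, k = 4
S_c = nonzero_span(basis)       # the three non-zero vectors of the subspace

# Three variables of this color: every choice of vectors gives the zero element.
over_budget = all(expand(triple) == set() for triple in product(S_c, repeat=3))
# Two variables that received different vectors: the product survives.
within_budget = all(expand((u, w)) != set() for u in S_c for w in S_c if u != w)
print(over_budget, within_budget)  # True True
```

Two variables of the same color that happen to receive the same vector also cancel; this is the event whose probability the subspace survival analysis bounds.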

It remains to show that the assignment has a constrained survival property too. Consider one of the 'allowed' multilinear terms. In general the term will consist of same-colored groups of variables $g_1, \ldots, g_m$, where group $g_i$ is colored with $c_i$ and contains $t_i \leq \mu(c_i)$ (distinct) variables. A necessary condition for the multilinear term to evaluate to non-zero is a joint survival property: for all i, the product of the variables in group $g_i$ must evaluate to non-zero.

Using the analysis in [6] (see footnote 2), the subspace survival probability for a group of size $t = \mu(c) \geq 2$ is precisely equal to

\[ p_t = \prod_{i=1}^{t-1} \frac{2^t - 2^i}{2^t - 1}. \]

We also trivially have $p_1 = 1$. The product $\prod_{i=1}^{m} p_{t_i}$ is equal to the joint survival probability by independence of the events. We claim that, within a constant, the grouping of minimum joint survival probability is the one that consists of k/3 groups of size t = 3 each. To see why, note that for all t we have $p_t > 1/4$ and $p_4^2 < 1/4$. In addition we have $p_7 > p_5 p_2$, $p_6 > p_3^2$, $p_5 > p_3 p_2$. This implies the minimum probability grouping cannot contain groups of size larger than 4, because they can be replaced with smaller groups to get a grouping with a lower probability. Similarly, $p_4^3 > p_3^4$ implies that a minimum probability term cannot contain more than 2 groups of size 4. On the other hand $p_3^2 < p_2^3$ and $p_3 < p_1^3$ imply that the minimum probability grouping can't contain more than 2 groups of size 2 or 1. Hence a term with minimum joint survival probability must consist of groups of size 3 and at most 2 groups each of sizes 4, 2 and 1.

2 The random assignment in [6] doesn't reject the $v_0$ vector. This makes the calculation in [6] slightly different; the denominator is $2^t$ instead of $2^t - 1$. Here it makes sense to reject $v_0$ to improve the survival probability.

It follows that the joint survival probability of any allowed multilinear term is $\Omega(p_3^{k/3})$. Conditioned on the event of joint survival, the probability that the multilinear term evaluates to non-zero is at least 1/4 by the usual survival property. This is because the subspaces are constructed independently at random; indeed the proof in [6] looks at the extreme case when all groups are singletons.

Overall, if P(X) contains an allowed term, evaluating the circuit on the proposed assignment returns a non-zero value with probability $\Omega(p_3^{k/3})$. To boost this probability to a constant value we need to repeat $O((1/p_3)^{k/3})$ times. As in [6, 10], each evaluation can be done in polynomial space and $O^*(2^k)$ time. So the overall running time is at most $O^*((8/p_3)^{k/3})$. After a simple calculation using the precise value for $p_3$, the running time turns out to be around $O^*(2.54^k)$. This finishes the proof. □
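
For reference, the final calculation can be written out as follows. It assumes the survival probability of a group of size 3 is the product of the chances that the second and third non-zero vectors avoid the span of the earlier ones, with $2^3 - 1 = 7$ non-zero vectors available for each choice, in line with the rejection of the zero vector mentioned in the footnote.

```latex
\begin{align*}
p_3 &= \frac{2^3 - 2^1}{2^3 - 1}\cdot\frac{2^3 - 2^2}{2^3 - 1}
     = \frac{6}{7}\cdot\frac{4}{7} = \frac{24}{49}, \\
\left(\frac{8}{p_3}\right)^{1/3}
    &= \left(\frac{8\cdot 49}{24}\right)^{1/3}
     = \left(\frac{49}{3}\right)^{1/3} \approx 2.537 .
\end{align*}
```

Under this assumption, $2^k$ time per evaluation and $(1/p_3)^{k/3}$ repetitions together give a base of roughly 2.54.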

Even Faster: As is clear from the proof, the number and sizes of the groups in an allowed multilinear term affect its probability of survival. The algorithm is in general faster when fewer color groups occur in the allowed term. For example when the multilinear term contains k/10 color groups the algorithm runs in $O^*(2.26^k)$ time. The algorithm can be as fast as $O^*(2^k)$ when there are $O(\log n)$ non-singleton groups (the probability of subspace survival is inversely polynomial).
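
The dependence of the running time on the group size can be tabulated with a few lines of code. The sketch assumes that an allowed term splits into k/t groups of equal size t, each drawn from a subspace of dimension exactly t, and that the survival probability of a group is the chance that t random non-zero vectors of a t-dimensional space are independent. The function names are hypothetical.

```python
from fractions import Fraction

def survival_probability(t):
    """Chance that t random non-zero vectors of Z_2^t are linearly independent."""
    prob = Fraction(1)
    for i in range(1, t):
        # the (i+1)-th vector must avoid the 2^i vectors spanned by the first i
        prob *= Fraction(2**t - 2**i, 2**t - 1)
    return prob

def running_time_base(t):
    """Base b of the O*(b^k) bound: 2^k per evaluation, (1/p_t)^(k/t) repetitions."""
    return 2 * float(1 / survival_probability(t)) ** (1 / t)

for t in (1, 2, 3, 4, 5, 10):
    print(t, round(running_time_base(t), 3))
```

Under these assumptions the base is largest at t = 3 (about 2.54), equals 2 for singleton groups, and is about 2.26 for t = 10.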

3 Graph Motif and Variants 

We now present the consequences of our main result for the Graph Motif problem and some variants introduced more recently to account for noise in biological data [?]. In most cases the reductions to k-CMLD are identical to the reductions of the Colorful Motif problem to k-MLD given in [5]; Colorful Motif is the case when every color appears exactly once in the motif. For this reason we omit most of the details. We note here that for each problem we state the result for its decision version. In every case the search problem can be solved via an easy reduction to the decision problem.

Theorem 3.1. There is a randomized algorithm for Graph Motif that runs in time $O^*(2.54^k)$ and polynomial space.

Proof. We assign to each vertex v of the input graph G a variable $x_v \in X$. For a (connected) subtree $T = (V_T, E_T)$ of G we let $Q(T) = \prod_{v \in V_T} x_v$. It is straightforward (as shown in [5] in the algorithm for the Colorful Motif problem) to construct a circuit C of size O(k|G|) with the following properties:

(i) Each subtree of size k contributes at least one copy of Q(T) in the polynomial P(X) represented by C.

(ii) Each multilinear monomial of degree k in P(X) is equal to Q(T) for some size-k subtree T of G.

We further assign the colors of the nodes to the corresponding variables and set the multiplicities $\mu(c)$ according to the motif. This gives an instance of k-CMLD. □
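
Properties (i) and (ii) can be observed on small graphs with a simple recurrence. The sketch below is an assumption made for illustration, not a transcription of the construction in [5]: it sets P_1(v) = x_v and P_j(v) to the sum, over neighbors u of v and splits j = j1 + j2, of P_{j1}(v) · P_{j2}(u), and it expands the polynomial explicitly, which is only feasible for tiny inputs. All names are hypothetical.

```python
from collections import Counter

def poly_mul(p, q):
    """Multiply polynomials stored as Counter: sorted tuple of vertices -> coefficient."""
    out = Counter()
    for m1, c1 in p.items():
        for m2, c2 in q.items():
            out[tuple(sorted(m1 + m2))] += c1 * c2
    return out

def subtree_polynomial(adj, k):
    """Sum over all roots v of P_k(v), using the assumed recurrence described above."""
    P = {1: {v: Counter({(v,): 1}) for v in adj}}
    for j in range(2, k + 1):
        P[j] = {v: Counter() for v in adj}
        for v in adj:
            for u in adj[v]:
                for j1 in range(1, j):
                    # attach a piece of size j - j1 rooted at neighbor u
                    P[j][v].update(poly_mul(P[j1][v], P[j - j1][u]))
    total = Counter()
    for v in adj:
        total.update(P[k][v])
    return total

def multilinear_supports(poly):
    """Vertex sets of the monomials in which no variable is repeated."""
    return {frozenset(m) for m in poly if len(set(m)) == len(m)}

path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}   # the path 0 - 1 - 2 - 3
supports = multilinear_supports(subtree_polynomial(path, 3))
print(sorted(sorted(s) for s in supports))  # [[0, 1, 2], [1, 2, 3]]
```

On this path the multilinear monomials of degree 3 correspond exactly to the two connected vertex sets of size 3, while a disconnected set such as {0, 1, 3} never appears.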

In the Graph Motif problem we have that $\sum_{c \in \mathcal{C}} \mu(c) = k$. The reduction to k-CMLD also solves a relaxed version of Graph Motif where the equality doesn't have to hold; this is known as Max Graph Motif or Multiset Graph Motif [5]. As shown in [?], the reduction to k-MLD can be used to solve a relaxation dubbed Graph Motif with Gaps in [?]; an alternative but equivalent formulation is called Min-Add in [3]. Informally this problem asks for a connected subgraph of size $r \geq k$ that contains a subset of the motif, for some specified r. The reduction of [?] extends immediately to k-CMLD, improving the running time from $O^*(4^k)$ to $O^*(2.54^k)$. The same improvement applies to the Min-CC problem, which asks for a minimum number of connected subgraphs that cover the motif, again via the reduction given in [?].

We finally consider the Min-Substitute problem [3]. Informally this problem asks for a connected subgraph with color multiplicities that are allowed to deviate from the motif, but as little as possible. Viewed alternatively, the subgraph meets the specifications in the motif but without accounting for p of its nodes, where p is as small as possible. The fastest known algorithm runs in time $O^*((3e)^k)$ [?]. We give an improved algorithm, which is not a direct reduction to k-CMLD but uses elements from it.

Theorem 3.2. There is a randomized algorithm for Min-Substitute that runs in $O^*(5.08^k)$ time and polynomial space.

Proof. Let P(X, A) be the 'extended' polynomial from the proof of Theorem 3.1; recall that it encodes the instance of the Max Motif for motifs of size k. We will introduce an extra set of variables Y that are in 1-1 correspondence with the variables of X. We introduce them in the circuit by replacing each terminal $x_i$ with the gate $a_{1,i} x_i + a_{2,i} y_i$, where $a_{1,i}, a_{2,i}$ are fresh variables from the special set A. Note then that each multilinear term $x_{i_1} \cdots x_{i_k}$ generates $2^k$ multilinear terms (now in the variables (X, Y)). However the use of multipliers $a_{1,i}, a_{2,i}$ ensures the isolation property, exactly as described in the proof of Theorem 1.1. We will assign to the variables in A random values from $\mathbb{F} = GF(2^{\log k + 5})$. The argument of [10] applies identically.

Having fixed the assignment to A, we evaluate P(A, X, Y) on a slightly more involved assignment over $\mathbb{Z}_2[\mathbb{Z}_2^{2k}]$. Let $(v_0 + v_{x_i})$ be the assignment to the variable $x_i$ in the proof of Theorem 1.1; recall that $v_{x_i}$ is a k-dimensional vector. Each variable $x_i$ gets assigned an element of the form

\[ (v_0 + \bar{v}_{x_i})(v_0 + w_{x_i}) \]

and the corresponding variable $y_i$ gets assigned $z(v_0 + w_{x_i})$, where:

(i) $\bar{v}_{x_i}$ is the vector $v_{x_i}$ padded with zeros in the lower k coordinates.

(ii) $w_{x_i}$ is a vector with zeros in the upper k coordinates and a random 0-1 vector in the lower k coordinates.

(iii) z is a free indeterminate that will remain unevaluated and will be handled symbolically.

We will evaluate the extended circuit over $\mathbb{F}[z][\mathbb{Z}_2^{2k}]$. Here the coefficients of the vectors in $\mathbb{Z}_2^{2k}$ are univariate polynomials from $\mathbb{F}[z]$. Note that every non-multilinear term in P(X) gives P(A, X, Y) terms that are multiples of $x_i^2$ or $y_i^2$ or $x_i y_i$. When evaluated, these three monomials are multiples of $(v_0 + w_{x_i})^2 = 0$. So, roughly speaking, the lower k coordinates always enforce the annihilation property. Also note that the factors that use vectors non-zero in the upper k coordinates enforce the constrained annihilation property, by construction. Assume however that a multilinear term $x_{i_1} \cdots x_{i_k}$ doesn't meet the multiplicities of the motif, unless it can avoid accounting for the multiplicities contributed by p of its variables/vertices (wlog the first p). Then the 'upper coordinates' for the multilinear term $y_{i_1} \cdots y_{i_p} x_{i_{p+1}} \cdots x_{i_k}$ evaluate to a non-zero value multiplied by $z^p$, with probability $\Omega(p_3^{k/3})$, as in the proof of Theorem 1.1. This is because the p variables are essentially dropped from the k upper coordinates that enforce constrained annihilation. Independently from that, the lower coordinates evaluate to non-zero with probability at least 1/4, by the survival property. Hence the coefficient of $z^p$ is non-zero (with some probability) if and only if there is a connected subgraph that exceeds by p the motif multiplicities. So, finding the smallest such p solves the problem. The evaluation of the circuit can be done in $O^*(4^k)$ time and polynomial space, and we need $O((1/p_3)^{k/3})$ evaluations. This finishes the proof. □